MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines.
نویسندگان
چکیده
Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improve sensitivity in differential expression analyses.
منابع مشابه
Peptide de novo sequencing of mixture tandem mass spectra
The impact of mixture spectra deconvolution on the performance of four popular de novo sequencing programs was tested using artificially constructed mixture spectra as well as experimental proteomics data. Mixture fragmentation spectra are recognized as a limitation in proteomics because they decrease the identification performance using database search engines. De novo sequencing approaches ar...
متن کاملiProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates.
The combination of tandem mass spectrometry and sequence database searching is the method of choice for the identification of peptides and the mapping of proteomes. Over the last several years, the volume of data generated in proteomic studies has increased dramatically, which challenges the computational approaches previously developed for these data. Furthermore, a multitude of search engines...
متن کاملFalse discovery rates of protein identifications: a strike against the two-peptide rule.
Most proteomics studies attempt to maximize the number of peptide identifications and subsequently infer proteins containing two or more peptides as reliable protein identifications. In this study, we evaluate the effect of this "two-peptide" rule on protein identifications, using multiple search tools and data sets. Contrary to the intuition, the "two-peptide" rule reduces the number of protei...
متن کاملIntegrating SQL Databases with Content-Specific Search Engines
In recent years, database research and product development activities have focused on support for non-traditional data types, such as text or multi-media documents. This paper describes an approach of coupling SQL databases and content-specific search engines, such as fulltext retrieval engines, in an efficient manner. It is based on a query rewrite scheme that exploits so-called table function...
متن کاملXLSearch: a Probabilistic Database Search Algorithm for Identifying Cross-Linked Peptides.
Chemical cross-linking combined with mass spectrometric analysis has become an important technique for probing protein three-dimensional structure and protein-protein interactions. A key step in this process is the accurate identification and validation of cross-linked peptides from tandem mass spectra. The identification of cross-linked peptides, however, presents challenges related to the exp...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Journal of proteome research
دوره 10 7 شماره
صفحات -
تاریخ انتشار 2011